Abstract. We consider the unweighted bipartite maximum matching problem in the one-pass
turnstile streaming model where the input stream consists of edge insertions and deletions. In the
insertion-only model, a one-pass 2-approximation streaming algorithm can be easily obtained with
space $O(n \log n)$, where $n$ denotes the number of vertices of the input graph. We show that no such
result is possible if edge deletions are allowed, even if space $O(n^{3/2-\delta})$ is granted, for every $\delta > 0$.

Specifically, for every $0 \le \epsilon \le 1$, we show that in the one-pass turnstile streaming model, in order
to compute an $O(n^{\epsilon})$-approximation, space $\Omega(n^{3/2-4\epsilon})$ is required for constant error randomized
algorithms, and, up to logarithmic factors, space $O(n^{2-2\epsilon})$ is sufficient.

Our lower bound result is proved in the simultaneous message model of communication and may 
be of independent interest. 


1 Introduction 

Massive graphs are usually dynamic objects that evolve over time in structure and size. For 
example, the Internet graph changes as webpages are created or deleted, the structure of social 
network graphs changes as friendships are established or ended, and graph databases change 
in size when data items are inserted or deleted. Dynamic graph algorithms can cope with 
evolving graphs of moderate sizes. They receive a sequence of updates, such as edge insertions 
or deletions, and maintain valid solutions at any moment. However, when considering massive 
graphs, these algorithms are often less suited as they assume random access to the input graph, 
an assumption that can hardly be guaranteed in this context. Consequently, research has been 
carried out on dynamic graph streaming algorithms that can handle both edge insertions and 
deletions. 

Dynamic Graph Streams. A data streaming algorithm processes an input stream $X = X_1, \dots, X_n$
sequentially item by item from left to right in passes while using a memory whose
size is sublinear in the size of the input [25]. Graph streams have been studied for almost two
decades. However, until recently, all graph streams considered in the literature were
insertion-only, i.e., they process streams consisting of sequences of edge insertions. In 2012, Ahn, Guha and
McGregor [2] initiated the study of dynamic graph streaming algorithms that process streams
consisting of both edge insertions and deletions. Since then, it has been shown that a variety
of problems for which space-efficient streaming algorithms in the insertion-only model are
known, such as testing connectivity and bipartiteness, computing spanning trees, computing 
cut-preserving sparsifiers and spectral sparsifiers, can similarly be solved well in small space 
in the dynamic model |2i;il20ll0j . An exception is the maximum matching problem which, as 
we will detail later, is probably the most studied graph problem in streaming settings. In the 
insertion-only model, a 2-approximation algorithm for this problem can easily be obtained in 
one pass with O(nlogn) space, where n is the number of vertices in the input graph. Even in 
the sliding-window model, which can be seen as a model located between the insertion-only
model and the dynamic model, the problem can be solved well [8]. The status of the problem

^ In the sliding-window model, an algorithm receives a potentially infinite insertion-only stream, however, only 
a fixed number of most recent edges are considered by the algorithm. Edges are seen as deleted when they are 
no longer contained in the most recent window of time. 





in the dynamic model has been open so far, and, in fact, the existence of sublinear space one- 
pass dynamic streaming algorithms for the maximum matching problem was one of the open 
problems collected at the Bertinoro 2014 workshop on sublinear algorithms.

Results on dynamic matching algorithms |6l5j show that even when the sequence of graph 
updates contains deletions, large matchings can be maintained without too many
reconfigurations. These results may give reasons for hope that constant or poly-logarithmic
approximations could be achieved in the one-pass dynamic streaming model. We, however, show that
if there is such an algorithm, then it uses a huge amount of space.

Summary of Our Results. In this paper, we present a one-pass dynamic streaming
algorithm for maximum bipartite matching and a space lower bound for streaming algorithms in
the turnstile model, a slightly more general model than the dynamic model (see Section 2 for
a discussion), the latter constituting the main contribution of this paper. We show that in one
pass, an $O(n^{\epsilon})$-approximation can be computed in space $\tilde{O}(n^{2-2\epsilon})$ (Theorem 4), and space
$\Omega(n^{3/2-4\epsilon})$ is necessary for such an approximation (Corollary 1).

Lower Bound via Communication Complexity. Many space lower bounds in the 
insertion-only model are proved in the one-way communication model. In the one-way model, 
party one sends a message to party two who, upon reception, sends a message to party three. 
This process continues until the last party receives a message and outputs the result. A recent 
result by Li, Nguyen and Woodruff [22] shows that space lower bounds for turnstile streaming 
algorithms can be proved in the more restrictive simultaneous model of communication (SIM 
model). In this model, the participating parties simultaneously each send a single message to a 
third party, denoted the referee, who computes the output of the protocol as a function of the 
received messages. A lower bound on the size of the largest message of the protocol is then a 
lower bound on the space requirements of a turnstile one-pass streaming algorithm. Our paper 
is the first that uses this connection in the context of graph problems. 

A starting point for our lower bound result is a work of Goel, Kapralov and Khanna [13],
and a follow-up work by Kapralov. In [13], via a one-way two-party communication lower
bound, it is shown that in the insertion-only model, every algorithm that computes a
$(3/2 - \epsilon)$-approximation, for $\epsilon > 0$, requires $n^{1+\Omega(1/\log\log n)}$ space. This lower bound has then been
strengthened by Kapralov to hold for $(e/(e-1) - \epsilon)$-approximation algorithms. Both lower bound
constructions heavily rely on Ruzsa-Szemerédi graphs. A graph $G$ is an $(r, s)$-Ruzsa-Szemerédi
graph (in short: RS-graph), if its edge set can be partitioned into $r$ disjoint induced matchings
each of size at least $s$. The main argument of [13] can be summarized as follows: Suppose that
the first party holds a relatively dense Ruzsa-Szemerédi graph $G_1$. The second party holds a
graph $G_2$ whose edges render one particular induced matching $M \subseteq E(G_1)$ of the first party
indispensable for every large matching in the graph $G_1 \cup G_2$, while all other induced matchings
are rendered redundant. Note that as $M$ is an induced matching, there are no alternative edges
in $G_1$ different from $M$ that interconnect the vertices that are matched by $M$. As the first party
is not aware which of its induced matchings is required, and as the communication budget is
restricted, only few edges of $M$ on average will be sent to the second party. Hence, the expected
size of the output matching is bounded.
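The RS-graph definition above can be made concrete with a small checker. The following is a minimal sketch, not taken from the paper: it assumes a bipartite graph given as a list of `(a, b)` pairs, and the function names `is_induced_matching` and `is_rs_partition` are hypothetical.

```python
def is_induced_matching(matching, all_edges):
    """Return True if `matching` is an induced matching of the graph `all_edges`."""
    edges = set(all_edges)
    touched_a = [a for a, _ in matching]
    touched_b = [b for _, b in matching]
    # A matching must not reuse any endpoint.
    if len(set(touched_a)) != len(touched_a) or len(set(touched_b)) != len(touched_b):
        return False
    chosen = set(matching)
    # Induced: the only graph edges between matched endpoints are the matching edges.
    for a in touched_a:
        for b in touched_b:
            if (a, b) in edges and (a, b) not in chosen:
                return False
    return True


def is_rs_partition(matchings, s):
    """Check that `matchings` splits its union into induced matchings of size >= s."""
    all_edges = [e for m in matchings for e in m]
    if len(set(all_edges)) != len(all_edges):  # the matchings must be edge-disjoint
        return False
    return all(len(m) >= s and is_induced_matching(m, all_edges) for m in matchings)
```

For instance, the two matchings `[(0, 0), (1, 1)]` and `[(0, 2), (1, 3)]` form a (2, 2)-RS-graph, whereas adding the edge `(0, 1)` to the graph would destroy the induced property of the first matching.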

When implementing the previous idea in the SIM setting, the following issues have to be 
addressed: 

Firstly, the number of parties in the simultaneous message protocol needs to be at least 
as large as the desired bound on the approximation factor. The trivial protocol where every 
party sends a maximum matching of its subgraph, and the referee outputs the largest received 
matching, shows that the approximation factor cannot be larger than the number of parties, even
when message sizes are as small as $\tilde{O}(n)$. Hence, proving hardness for polynomial approximation
factors requires a polynomial number of participating parties. On the other hand, the number
of parties cannot be chosen too large either: If the input graph is equally split among $p$ parties,
for a large $p$, then the subgraphs of the parties are of size $O(n^2/p)$. Thus, with messages of size
$\tilde{O}(n^2/p)$, all subgraphs can be sent to the referee who then computes and outputs an optimal
solution. Hence, the larger the number of parties, the weaker the bound on the message sizes
that can be achieved.

^ See also http://sublinear.info/64
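The trivial protocol mentioned above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's code: each party's subgraph is a list of `(a, b)` edges, the helper `max_bipartite_matching` is a standard augmenting-path (Kuhn) routine, and both function names are hypothetical.

```python
def max_bipartite_matching(edges):
    """Augmenting-path (Kuhn) algorithm; returns a maximum matching as (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    match_of_b = {}  # b -> the a currently matched to b

    def try_augment(a, seen):
        for b in adj.get(a, []):
            if b in seen:
                continue
            seen.add(b)
            # Use b if it is free, or if its current partner can be re-matched.
            if b not in match_of_b or try_augment(match_of_b[b], seen):
                match_of_b[b] = a
                return True
        return False

    for a in adj:
        try_augment(a, set())
    return [(a, b) for b, a in match_of_b.items()]


def trivial_protocol(party_graphs):
    """Each party sends a maximum matching of its subgraph; the referee keeps the largest."""
    messages = [max_bipartite_matching(g) for g in party_graphs]
    return max(messages, key=len)
```

The approximation factor is at most the number of parties because an optimal matching, restricted to any single party's subgraph, is no larger than that party's own maximum matching, so the largest message contains at least a 1/P fraction of the optimum.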

Secondly, there is no “second party” as in the one-way setting whose edges could render one 
particular matching of every other party indispensable. Instead, a construction is required so 
that every party both has the function of party one (one of its induced matchings is indispensable 
for every large matching) and of party two (some of its edges render many of the induced 
matchings of other parties redundant). This suggests that the RS-graphs of the parties have 
to overlap in many vertices. While arbitrary RS-graphs with good properties can be employed
for the lower bounds of Goel et al. [13] and Kapralov, we need RS-graphs with simple structure in order to
coordinate the overlaps between the parties.

We show that both concerns can be handled. In Section 3, we present a carefully designed
input distribution where each party holds a highly symmetrical RS-graph. The RS-graph of 
a party overlaps almost everywhere with the RS-graphs of other parties, except in one small 
induced matching. This matching, however, cannot be distinguished by the party, and hence, 
as in the one-way setting, the referee will not receive many edges of this matching. 

Upper Bound. Our upper bound result is achieved by an implementation of a simple
matching algorithm in the dynamic streaming model: For an integer $k$, pick a random subset
$A' \subseteq A$ of size $k$ of one bipartition of the bipartite input graph $G = (A, B, E)$; for each $a \in A'$,
store arbitrary $\min\{k, \deg(a)\}$ incident edges, where $\deg(a)$ denotes the degree of $a$ in the input
graph; output a maximum matching in the graph induced by the stored edges. We prove that
this algorithm has an approximation factor of $n/k$. In order to collect $k$ incident edges of a
given vertex in the dynamic streaming model, we employ the $\ell_0$-samplers of Jowhari, Sağlam and
Tardos [16], which have previously been used for dynamic graph streaming algorithms [2]. By
choosing $k = n^{1-\epsilon}$, this construction leads to an $O(n^{\epsilon})$-approximation algorithm with space
$\tilde{O}(n^{2-2\epsilon})$.

While this algorithm in itself is rather simple and standard, it shows that non-trivial
approximation ratios for maximum bipartite matching in the dynamic streaming model are
possible with sublinear space. Our upper and lower bounds show that in order to compute an
$n^{\epsilon}$-approximation, space $\tilde{O}(n^{2-2\epsilon})$ is sufficient and space $\Omega(n^{3/2-4\epsilon})$ is required. Improving on
either side is left as an open problem.

Further Related Work. Matching problems are probably the most studied graph problem 
in the streaming model fT 2 ll 2 ;-il 9 ll()liy 2 l 21 l 26 ll 3 ll 7 ll 4 l 8 y 7 ll 9 l 24 ll 8 lllj . Closest to our work are 
the already mentioned lower bounds of Goel et al. [13] and Kapralov. Their arguments are combinatorial and so are
the arguments in this paper. Note that lower bounds for matching problems in communication 
settings have also been obtained via information complexity in unis!. 

In the dynamic streaming model, Ahn, Guha, and McGregor [2] provide a multi-pass
algorithm with $\tilde{O}(n^{1+1/p}\,\mathrm{poly}\,\epsilon^{-1})$ space, $O(p \cdot \epsilon^{-2} \cdot \log \epsilon^{-1})$ passes, and approximation factor $1 + \epsilon$
for the weighted maximum matching problem, for a parameter $p$. This is the only result on
matchings known in the dynamic streaming setting.

Recent Related Work. Assadi et al. [1], independently of and concurrently with this work,
essentially resolve the questions asked in this paper. Using the same techniques ($\ell_0$-sampling for
the upper bound, simultaneous communication complexity and Ruzsa-Szemerédi graphs for the
lower bound), they show that there is an $O(n^{\epsilon})$-approximation dynamic streaming algorithm for
maximum matching which uses $\tilde{O}(n^{2-3\epsilon})$ space. Furthermore, they prove that this is essentially
tight for turnstile algorithms: any such algorithm in the turnstile model requires space at least
$n^{2-3\epsilon-o(1)}$.

Outline. We start our presentation with a section on preliminaries. Then, in Section 3, we
present our hard input distribution, which is then used in Section 4 in order to prove our lower
bound in the SIM model. Finally, we conclude with our upper bound in Section 5.

2 Preliminaries 

For an integer $a \ge 1$, we write $[a]$ for $\{1, \dots, a\}$. We use the notation $\tilde{O}()$, which equals the
standard $O()$ notation where all poly-logarithmic factors are ignored.

Simultaneous Communication Complexity. Let $G = (A, B, E)$ denote a simple bipartite
graph, and, for an integer $P \ge 2$, let $G_1, \dots, G_P$ be edge-disjoint subgraphs of $G$. In the
simultaneous message complexity setting, for $p \in [P]$, party $p$ is given $G_p$, and sends a single
message $\mu_p$ of limited size to a third party denoted the referee. Upon reception of all messages,
the referee outputs a matching $M$ in $G$. Note that the participating parties cannot communicate
with each other, but they have access to an infinite number of shared random coin flips which
can be used to synchronize their messages.

We say that an algorithm/protocol is a constant error algorithm/protocol if it errs with
probability at most $\epsilon$, for $0 \le \epsilon < 1/2$. We also assume that an algorithm/protocol never outputs
edges that do not exist in the input graph.

Turnstile streams. For a bipartite graph $G = (A, B, E)$, let $X = X_1, X_2, \dots$ be the input
stream with $X_i \in E \times \{+1, -1\}$, where $+1$ indicates that an edge is inserted, and $-1$ indicates
that an edge is deleted. Edges could potentially be inserted multiple times, or be removed before
they have been inserted, as long as once the stream has been fully processed, the multiplicity
of an edge is in $\{-c, -c+1, \dots, c-1, c\}$, for some integer $c$. The reduction of [22], and hence
our lower bound, holds for algorithms that can handle this type of dynamic streams, also known
as turnstile streams. Such algorithms may for instance abort if negative edge multiplicities are
encountered, or they output a solution among the edges with non-zero multiplicity.
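To illustrate the stream semantics, here is a minimal sketch that replays a turnstile stream and reports the final edge multiplicities. It is not a small-space algorithm (it stores every edge); the function name `final_multiplicities` and the choice to raise an error when a final multiplicity leaves the range $\{-c, \dots, c\}$ are assumptions made for the example.

```python
from collections import Counter


def final_multiplicities(stream, c=1):
    """Apply (edge, +1/-1) updates and return the non-zero final multiplicities."""
    mult = Counter()
    for edge, delta in stream:
        if delta not in (+1, -1):
            raise ValueError("updates must be +1 (insert) or -1 (delete)")
        mult[edge] += delta  # intermediate values may be negative in a turnstile stream
    final = {e: m for e, m in mult.items() if m != 0}
    if any(abs(m) > c for m in final.values()):
        raise ValueError("final multiplicity outside {-c, ..., c}")
    return final
```

Note that an edge may be deleted before it is inserted: the stream `[((1, 2), -1), ((1, 2), +1)]` leaves edge `(1, 2)` with multiplicity 0, so it does not appear in the result.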

In [22] it is shown that every turnstile algorithm can be seen as an algorithm that solely 
computes a linear sketch of the input stream. As linear sketches can be implemented in the 
SIM model, lower bounds in the SIM model are lower bounds on the sketching complexity 
of problems, which in turn imply lower bounds for turnstile algorithms. We stress that our 
lower bound holds for linear sketches. Note that all known dynamic graph algorithms solely
compute linear sketches (e.g. HEEnnsj). This gives reasons to conjecture that also all dynamic 
algorithms can be seen as linear sketches, and, as a consequence, our lower bound not only holds 
for turnstile algorithms but for all dynamic algorithms. 
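The reason linear sketches can be implemented in the SIM model is linearity: if every party applies the same matrix (agreed upon via shared randomness) to the multiplicity vector of its own edges, the referee can add the messages and obtain the sketch of the whole graph. The sketch below illustrates only this property; the random $\pm 1$ matrix and all function names are assumptions for the example and are not the sketches used in [22] or in the paper.

```python
import random


def make_sketch_matrix(rows, dim, seed):
    """Random +/-1 matrix; the shared seed plays the role of the shared random coins."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]


def sketch(matrix, x):
    """Compute the linear sketch (matrix times vector) of a multiplicity vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in matrix]


def referee_combine(messages):
    """The referee adds the parties' sketches coordinate by coordinate."""
    return [sum(col) for col in zip(*messages)]
```

With two parties holding vectors `x1` and `x2`, `referee_combine([sketch(S, x1), sketch(S, x2)])` equals `sketch(S, x1 + x2)` (coordinate-wise sum), so any function of the sketch of the full input can be evaluated by the referee.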

3 Hard Input Distribution 

In this section, we construct our hard input distribution. First, we describe the construction of
the distribution from a global point of view in Subsection 3.1. Restricted to the input graph $G_p$
of any party $p \in [P]$, the distribution of $G_p$ can be described by a different construction which
is simpler and more suitable for our purposes. This will be discussed in Subsection 3.2.

^ Some of those algorithms cannot handle arbitrary turnstile streams as they rely on the fact that all edge
multiplicities are in $\{0, 1\}$.

3.1. Hard Input Distribution: Global View. Denote by $P$ the number of parties of
the simultaneous message protocol. Let $k, Q$ be integers so that $P \le k \le n/P$, and $Q = o(P)$.
The precise values of $k$ and $Q$ will be determined later. First, we define a bipartite graph
$G' = (A, B, E)$ on $O(n)$ vertices with $A = B = [(Q + P)k]$ from which we obtain our hard input
distribution. For $1 \le i \le Q + P$, let $A_i = [1 + (i-1)k, ik]$ and let $B_i = [1 + (i-1)k, ik]$. The
edge set $E$ is a collection of matchings as follows:

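$$E = \bigcup_{i,j \in [Q],\, p \in [P]} M^p_{i,j} \;\cup \bigcup_{p \in [P]} M_{Q+p,Q+p} \;\cup \bigcup_{p \in [P],\, j \in [Q]} M_{Q+p,j} \;\cup \bigcup_{p \in [P],\, j \in [Q]} M_{j,Q+p},$$

where $M_{i,j}$ is a perfect matching between $A_i$ and $B_j$, and $M^1_{i,j}, \dots, M^P_{i,j}$ are $P$ edge-disjoint
perfect matchings between $A_i$ and $B_j$. Note that as we required that $k \ge P$, the edge-disjoint
matchings $M^1_{i,j}, \dots, M^P_{i,j}$ can be constructed.

From $G'$, we construct the input graphs of the different parties as follows: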


1. For every $p \in [P]$, let $G'_p = (A, B, E_p)$ where $E_p$ consists of the matchings $M^p_{i,j}$ for $i, j \in [Q]$,
the matching $M_{Q+p,Q+p}$, and the matchings $M_{Q+p,j}$ and $M_{j,Q+p}$ for $j \in [Q]$.

2. For every $p \in [P]$, for every matching $M$ of $G'_p$, pick a subset of edges of size $k/2$ from $M$
uniformly at random and replace $M$ by this subset.

3. Pick random permutations $\pi_A, \pi_B : [Q + P] \to [Q + P]$. Permute the vertex IDs of the graphs
$G'_p$, for $1 \le p \le P$, so that if $\pi_A(i) = j$ then $A_i$ receives the IDs of $A_j$ as follows: The vertices
$a_1 = 1 + k(i-1), a_2 = 2 + k(i-1), \dots, a_k = ki$ receive new IDs so that after the change
of IDs, we have $a_1 = 1 + k(j-1), a_2 = 2 + k(j-1), \dots, a_k = kj$. The same procedure is
carried out with vertices $B_i$ and permutation $\pi_B$. Denote by $G_p$ the graph $G'_p$ once half of
the edges have been removed and the vertex IDs have been permuted. Let $G$ be the union
of the graphs $G_p$.
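The three construction steps can be sketched as follows. This is an illustration under assumptions, not the paper's construction verbatim: the $P$ edge-disjoint perfect matchings between two groups are realized by cyclic shifts (which is one way to see why $k \ge P$ suffices), vertex IDs are 1-indexed as in the text, and the names `group`, `shifted_matching`, `random_perm` and `party_graph` are hypothetical.

```python
import random


def group(i, k):
    """Vertex IDs of group A_i (or B_i): 1 + (i-1)k, ..., ik."""
    return list(range(1 + (i - 1) * k, i * k + 1))


def shifted_matching(i, j, k, shift=0):
    """Perfect matching between A_i and B_j pairing position t with position t + shift (mod k)."""
    a, b = group(i, k), group(j, k)
    return [(a[t], b[(t + shift) % k]) for t in range(k)]


def random_perm(m, rng):
    """Random permutation of [m], returned as a dict i -> pi(i)."""
    perm = list(range(1, m + 1))
    rng.shuffle(perm)
    return {i + 1: perm[i] for i in range(m)}


def party_graph(p, P, Q, k, pi_a, pi_b, rng):
    """Edge list of G_p for party p (1-indexed), following Steps 1 to 3."""
    assert 1 <= p <= P <= k
    # Step 1: the (Q+1)^2 matchings of G'_p.
    matchings = [shifted_matching(i, j, k, shift=p - 1)        # M^p_{i,j}
                 for i in range(1, Q + 1) for j in range(1, Q + 1)]
    matchings.append(shifted_matching(Q + p, Q + p, k))         # M_{Q+p,Q+p}
    for j in range(1, Q + 1):
        matchings.append(shifted_matching(Q + p, j, k))         # M_{Q+p,j}
        matchings.append(shifted_matching(j, Q + p, k))         # M_{j,Q+p}
    # Step 2: keep a uniformly random half of every matching.
    edges = []
    for m in matchings:
        edges.extend(rng.sample(m, k // 2))

    # Step 3: group i receives the IDs of group pi(i); offsets inside a group are kept.
    def relabel(v, pi):
        i, offset = (v - 1) // k + 1, (v - 1) % k
        return (pi[i] - 1) * k + offset + 1

    return [(relabel(a, pi_a), relabel(b, pi_b)) for a, b in edges]
```

The same permutations `pi_a` and `pi_b` are passed to every party, so the union of the returned edge lists plays the role of $G$; each party receives $(Q+1)^2$ matchings of $k/2$ edges each.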

The structure of $G'$ and a subgraph $G'_p$ is illustrated in Figure 1.

Fig. 1. Left: Graph $G'$. A vertex corresponds to a group of $k$ vertices. Each edge indicates a perfect matching
between the respective vertex groups. The bold edges correspond to the matchings $M_{Q+p,Q+p}$ for $1 \le p \le P$,
the solid edges correspond to matchings $M^p_{i,j}$ for $1 \le i, j \le Q$, $1 \le p \le P$, and the dotted edges correspond to
matchings $M_{Q+p,i}$, $M_{i,Q+p}$, for $1 \le i \le Q$ and $1 \le p \le P$. Right: Subgraph $G'_p \subseteq G'$.


Properties of the input graphs. Graph $G'$ has a perfect matching of size $(Q + P)k$ which
consists of a perfect matching between vertices $A_1, \dots, A_Q$ and $B_1, \dots, B_Q$, and the matchings
$M_{Q+p,Q+p}$ for $1 \le p \le P$. As by Step 2 of the construction of the hard instances, we remove half
of the edges of every matching, a maximum matching in graph $G$ is of size at least $\frac{(Q+P)k}{2}$. Note
that while there are many possibilities to match the vertex groups $A_1, \dots, A_Q$ and $B_1, \dots, B_Q$,
in every large matching, many vertices of $A_{Q+i}$ are matched to vertices of $B_{Q+i}$ using edges from
the matching $M_{Q+i,Q+i}$. For some $p \in [P]$, consider now the graph $G'_p$ from which the graph $G_p$
is constructed. $G'_p$ consists of perfect matchings between the vertex groups $A_i$ and $B_j$ for every
$i, j \in [Q] \cup \{Q+p\}$. In graph $G_p$, besides the fact that only half of the edges of every matching are
kept, the vertex IDs are permuted. We will argue that due to the permuted vertices, given $G_p$,
it is difficult to determine which of the matchings corresponds to the matching $M_{Q+p,Q+p}$ in
$G'$. Therefore, if the referee is able to output edges from the matching $M_{Q+p,Q+p}$, then many
edges from every matching have to be included into the message $\mu_p$ sent by party $p$.

^ For instance, define $G'$ so that $G'|_{A_i \cup B_j}$ is a $P$-regular bipartite graph. It is well-known (and easy to see via
Hall's theorem) that any $P$-regular bipartite graph is the union of $P$ edge-disjoint perfect matchings.

3.2. Hard Input Distribution: Local View. From the perspective of an individual party,
by symmetry of the previous construction, the distribution from which the graph $G_p$ is chosen
can also be described as follows:

1. Pick $I_A, I_B \subseteq [Q + P]$ so that $|I_A| = |I_B| = Q + 1$ uniformly at random.

2. For every $i \in I_A$ and $j \in I_B$, introduce a matching of size $k/2$ between $A_i$ and $B_j$ chosen
uniformly at random from all possible matchings between $A_i$ and $B_j$ of size $k/2$.

$G_p$ can be seen as a $((Q + 1)^2, k/2)$-Ruzsa-Szemerédi graph or as a $(Q + 1, k(Q + 1)/2)$-Ruzsa-Szemerédi
graph. Let $\mathcal{G}_p$ denote the set of possible input graphs of party $p$. We prove now a lower
bound on $|\mathcal{G}_p|$.


Lemma 1. There are

$$|\mathcal{G}_p| \ge \binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \left(\frac{2^k}{k+1}\right)^{(Q+1)^2}$$

possible input graphs for every party $p$. Moreover, the input distribution is uniform.

Proof. The vertex groups $I_A$ and $I_B$ are each of cardinality $Q+1$ and chosen from the set $[Q+P]$.
There are $\binom{Q+P}{Q+1}$ choices for $I_A$. Consider one particular choice of $I_A$. Then, there are $\frac{(Q+P)!}{(P-1)!}$
possibilities to pair those with $Q + 1$ vertex groups of the $B$ nodes. Each matching is a subset
of $k/2$ edges from $k$ potential edges. Hence, there are $\binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \binom{k}{k/2}^{(Q+1)^2}$ input graphs for
each party. Using a bound on the central binomial coefficient, this term can be bounded from
below by $\binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \left(\frac{2^k}{k+1}\right)^{(Q+1)^2}$. □
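The counting argument can be checked numerically for small parameters. The sketch below is an illustration under assumptions: it takes the pairing factor to be $(Q+P)!/(P-1)!$, which is read off from the form of the bound used later in the proof of Lemma 3, and it uses the standard estimate $\binom{k}{k/2} \ge 2^k/(k+1)$ for the central binomial coefficient. The function names are hypothetical.

```python
from math import comb, factorial


def num_party_inputs(P, Q, k):
    """Count of input graphs for one party under the local view (k assumed even)."""
    choose_a = comb(Q + P, Q + 1)                    # choices for I_A
    pair_b = factorial(Q + P) // factorial(P - 1)    # assumed pairing factor for the B groups
    per_matching = comb(k, k // 2)                   # k/2 edges out of k potential edges
    return choose_a * pair_b * per_matching ** ((Q + 1) ** 2)


def lower_bound_holds(P, Q, k):
    """Check count >= C(Q+P, Q+1) * (Q+P)!/(P-1)! * (2^k / (k+1))^((Q+1)^2) exactly."""
    e = (Q + 1) ** 2
    prefix = comb(Q + P, Q + 1) * (factorial(Q + P) // factorial(P - 1))
    # Multiply both sides by (k+1)^e to stay in integer arithmetic.
    return num_party_inputs(P, Q, k) * (k + 1) ** e >= prefix * 2 ** (k * e)
```

The central binomial estimate holds because $\binom{k}{k/2}$ is the largest of the $k+1$ binomial coefficients $\binom{k}{0}, \dots, \binom{k}{k}$, which sum to $2^k$.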

The matching in $G_p$ that corresponds to the matching between $A_{Q+p}$ and $B_{Q+p}$ in $G'_p$ will play
an important role in our argument. In the previous construction, every introduced matching in
$G_p$ plays the role of matching $M_{Q+p,Q+p}$ in $G'_p$ with equal probability. In the following, we will
denote by $M_p$ the matching in $G_p$ that corresponds to the matching $M_{Q+p,Q+p}$ in $G'_p$.


4 Simultaneous Message Complexity Lower Bound 

We prove now that no communication protocol with limited maximal message size performs well
on the input distribution described in Section 3. First, we focus on deterministic protocols, and
we prove a lower bound on the expected approximation ratio (over all possible input graphs)
of any deterministic protocol (Theorem 1). Then, via an application of Yao's lemma, we obtain
our result for randomized constant error protocols (Theorem 2). Our lower bound for dynamic
one-pass streaming algorithms, Corollary 1, is then obtained as a corollary of Theorem 2 and
the reduction of [22].

Lower Bound For Deterministic Protocols. Consider a deterministic protocol that runs on a
hard instance graph $G$ and uses messages of length at most $s$. As the protocol is deterministic,
for every party $p \in [P]$, there exists a function $m_p$ that maps the input graph $G_p$ of party $p$ to
a message $\mu_p$. As the maximum message length is limited by $s$, there are at most $2^s$ different possible
messages. Our parameters $Q, k$ will be chosen so that $2^s$ is much smaller than the number of input
graphs $G_p$ for party $p$, as stated in Lemma 1. Consequently, many input graphs are mapped to
the same message.

Consider now a message $\mu_p$ and denote by $\mu_p^{-1}$ the set of graphs $G_p$ that are mapped by $m_p$
to message $\mu_p$. Upon reception of $\mu_p$, the referee can only output edges that are contained in
every graph of $\mu_p^{-1}$, since all outputted edges have to be contained in the input graph.

Let $N$ denote the matching outputted by the referee, and let $N_p = N \cap M_p$ denote the
outputted edges from matching $M_p$. Furthermore, for a given message $\mu_p$, denote by
$G_{\mu_p} := \bigcap_{G_p \in \mu_p^{-1}} G_p$.

In the following, we will bound the quantity $E|N_p|$ from above (Lemma 2). By linearity
of expectation, this allows us to argue about the expected number of edges of the matchings
$\bigcup_p N_p$ outputted by the referee. We can hence argue about the expected size of the outputted
matching, which in turn implies a lower bound on the approximation guarantee of the protocol
(Theorem 1).
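Lemma 2. For every party $p \in [P]$, we have $E|N_p| = O\left(\frac{\sqrt{sk}}{Q}\right)$.

Proof. Let $\Gamma$ denote the set of potential messages from party $p$ to the referee. As the maximum
message length is bounded by $s$, we have $|\Gamma| \le 2^s$. Let $V = \frac{|\mathcal{G}_p|}{k 2^s}$ be a parameter which splits the
set $\Gamma$ into two parts as follows. Denote by $\Gamma_{\ge} \subseteq \Gamma$ the set of messages $\mu_p$ so that $|\mu_p^{-1}| \ge V$,
and let $\Gamma_{<} = \Gamma \setminus \Gamma_{\ge}$. In the following, for a message $\mu_p \in \Gamma$, we denote by $\Pr[\mu_p]$ the probability
that message $\mu_p$ is sent by party $p$. Note that $\sum_{\mu_p \in \Gamma_{<}} \Pr[\mu_p] < \frac{2^s V}{|\mathcal{G}_p|}$, since there are at most $2^s V$
input graphs that are mapped to messages in $\Gamma_{<}$. We hence obtain:

$$E|N_p| \le \sum_{\mu_p \in \Gamma} \Pr[\mu_p] \, E|G_{\mu_p} \cap M_p| = \sum_{\mu_p \in \Gamma_{\ge}} \Pr[\mu_p] \, E|G_{\mu_p} \cap M_p| + \sum_{\mu_p \in \Gamma_{<}} \Pr[\mu_p] \, E|G_{\mu_p} \cap M_p|$$

$$\le \max\{E|G_{\mu_p} \cap M_p| : \mu_p \in \Gamma_{\ge}\} + \frac{2^s V}{|\mathcal{G}_p|} k = \max\{E|G_{\mu_p} \cap M_p| : \mu_p \in \Gamma_{\ge}\} + 1,$$

where we used the definition of $V$ for the last equality. In Lemma 3, we prove that for all $\mu_p \in \Gamma_{\ge}$:
$E|G_{\mu_p} \cap M_p| = O\left(\frac{\sqrt{sk}}{Q}\right)$, which implies the result. □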

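Lemma 3. Suppose $\mu_p$ is so that $|\mu_p^{-1}| \ge V = \frac{|\mathcal{G}_p|}{k 2^s}$. Then, $E|G_{\mu_p} \cap M_p| = O\left(\frac{\sqrt{sk}}{Q}\right)$.

Proof. Remember that every graph $G_p \in \mu_p^{-1}$ consists of $(Q + 1)^2$ edge-disjoint matchings, and
$M_p$ is a randomly chosen one of those. We define

$$I_l = \{(i, j) \in [Q + P] \times [Q + P] : G_{\mu_p}|_{A_i \cup B_j} \text{ contains a matching of size } l\}.$$

We prove first that if $|I_l|$ is large, then $\mu_p^{-1}$ is small.

Claim. Let $l = o(k)$. Then, $|I_l| \ge x \Rightarrow |\mu_p^{-1}| < \binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \left(\frac{3}{4}\right)^{lx} \binom{k}{k/2}^{(Q+1)^2}$.

Proof. Every graph of $\mu_p^{-1}$ contains $l$ edges of $x$ (fixed) matchings. The remaining edges and
remaining matchings can be arbitrarily chosen. Then, by a similar argument as in the proof of
Lemma 1, we obtain

$$|\mu_p^{-1}| \le \binom{Q+P-x}{Q+1-x} \frac{(Q+P-x)!}{(P-1)!} \binom{k-l}{k/2-l}^{x} \binom{k}{k/2}^{(Q+1)^2-x} < \binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \left(\frac{3}{4}\right)^{lx} \binom{k}{k/2}^{(Q+1)^2},$$

where we used $\binom{k-l}{k/2-l} = \binom{k-l}{k/2}$ and the bound $\binom{k-l}{k/2} \le \left(\frac{3}{4}\right)^{l} \binom{k}{k/2}$ (remember: $l = o(k)$), which
follows from Lemma 7 (see Appendix). □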

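Then, we can bound:

$$E|G_{\mu_p} \cap M_p| \le \frac{|I_l|}{(Q+1)^2} \cdot k + \left(1 - \frac{|I_l|}{(Q+1)^2}\right) \cdot l \le \frac{|I_l|}{(Q+1)^2} \cdot k + l. \qquad (1)$$

Note that by assumption, we have $|\mu_p^{-1}| \ge V$. Let $l, x$ be two integers so that:

$$\binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \left(\frac{3}{4}\right)^{lx} \binom{k}{k/2}^{(Q+1)^2} = V. \qquad (2)$$

Then, by the previous claim, we obtain $|I_l| < x$. Solving Equality 2 for variable $x$, and further
bounding it yields:

$$x \le \frac{1}{l} \left( (Q+1)^2 \left(k - \tfrac{1}{2} \log k\right) + \log\left( \binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \right) - \log V \right). \qquad (3)$$

Remember that $V$ was chosen as $V = \frac{|\mathcal{G}_p|}{k 2^s}$ and hence $\log V \ge (Q + 1)^2 (k - \log(k + 1)) +
\log\left( \binom{Q+P}{Q+1} \frac{(Q+P)!}{(P-1)!} \right) - s - \log k$. Using this bound in Inequality 3 yields

$$x \le \frac{1}{l} \left( (Q+1)^2 \left(\log(k+1) - \tfrac{1}{2} \log k\right) + s + \log k \right) \le \frac{1}{l} \left( (Q+1)^2 \log(k+1) + s \right).$$

Now, using $|I_l| < x$ and the previous inequality on $x$, we continue simplifying Inequality 1 as
follows:

$$E|G_{\mu_p} \cap M_p| \le \frac{|I_l|}{(Q+1)^2} \cdot k + l \le \frac{(Q+1)^2 \log(k+1) + s}{l (Q+1)^2} \cdot k + l = O\left( \frac{sk}{l (Q+1)^2} + l \right),$$

since $s = \omega((Q + 1)^2 \log(k + 1))$. We optimize by choosing $l = \frac{\sqrt{sk}}{Q+1}$,
and we conclude $E|G_{\mu_p} \cap M_p| = O\left(\frac{\sqrt{sk}}{Q}\right)$. □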


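Theorem 1. For any $P \le \sqrt{n}$, let $\mathcal{P}_{\mathrm{det}}$ be a $P$-party deterministic simultaneous message
protocol for maximum matching where all messages are of size at most $s$. Then, $\mathcal{P}_{\mathrm{det}}$ has an
expected approximation factor of $\Omega\left(\left(\frac{Pn}{s}\right)^{1/4}\right)$.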


Proof. For every matching $M'$ in the input graph $G$, the size of $M'$ can be bounded by
$|M'| \le 2Qk + \sum_{p=1}^{P} |M' \cap M_p|$, since at most $2Qk$ edges can be matched to the vertices of the vertex
groups $\bigcup_{i \in [Q]} A_i \cup B_i$, and the edges of matchings $M_p$ are the only ones not incident to any
vertex in $\bigcup_{i \in [Q]} A_i \cup B_i$. Hence, by linearity of expectation, and the application of Lemma 2,
we obtain:

$$E|N| \le 2Qk + \sum_{p=1}^{P} E|N_p| \le 2Qk + P \cdot O\left(\frac{\sqrt{sk}}{Q}\right). \qquad (4)$$

A maximum matching in $G$ is of size at least $\frac{k(Q+P)}{2}$. We hence obtain the expected
approximation factor:

$$E\left[\frac{\frac{1}{2} k(Q+P)}{|N|}\right] \ge \frac{\frac{1}{2} k(Q+P)}{E|N|} = \Omega\left( \frac{k(Q+P)}{Qk + P \frac{\sqrt{sk}}{Q}} \right) = \Omega\left( \frac{(Q+P) Q \sqrt{k}}{Q^2 \sqrt{k} + P \sqrt{s}} \right)$$

$$= \Omega\left( \frac{P Q \sqrt{k}}{Q^2 \sqrt{k} + P \sqrt{s}} \right) = \Omega\left( \min\left\{ \frac{P}{Q}, \frac{Q \sqrt{k}}{\sqrt{s}} \right\} \right), \qquad (5)$$

where the first inequality follows from Jensen's inequality, and the third equality uses $Q = o(P)$.
The previous expression is maximized for $Q = \left(P \sqrt{s/k}\right)^{1/2}$, and we obtain an approximation
factor of $\Omega\left(P^{1/2} (k/s)^{1/4}\right)$. In turn, this expression is maximized when $k$ is as large as possible, that
is, $k = n/P$ (remember that the possible range for $k$ is $P \le k \le n/P$). We hence conclude that
the approximation factor is $\Omega\left(\left(\frac{Pn}{s}\right)^{1/4}\right)$. □
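The two optimization steps at the end of the proof can be spelled out as a short worked derivation. This sketch assumes that the last expression before the optimization has the form $\Omega\big(PQ\sqrt{k} / (Q^2\sqrt{k} + P\sqrt{s})\big)$, as the surrounding derivation suggests; rounding $Q$ to an integer is ignored.

```latex
% A fraction of the form xy/(x+y) is within a factor 2 of min{x, y}:
\[
\frac{PQ\sqrt{k}}{Q^2\sqrt{k} + P\sqrt{s}}
  = \Theta\!\left(\min\left\{\frac{P}{Q},\; Q\sqrt{\frac{k}{s}}\right\}\right).
\]
% The minimum is largest when both terms are equal:
\[
\frac{P}{Q} = Q\sqrt{\frac{k}{s}}
\;\Longrightarrow\;
Q = \left(P\sqrt{\frac{s}{k}}\right)^{1/2},
\qquad
\frac{P}{Q} = P^{1/2}\left(\frac{k}{s}\right)^{1/4}.
\]
% Substituting the largest admissible value k = n/P:
\[
P^{1/2}\left(\frac{n}{Ps}\right)^{1/4} = \left(\frac{Pn}{s}\right)^{1/4}.
\]
```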

Lower Bound for Randomized Protocols. Last, in Theorem 2 (proof in appendix), we extend
our deterministic lower bound to randomized protocols.

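Theorem 2. For any $P \le \sqrt{n}$, let $\mathcal{P}_{\mathrm{rand}}$ be a $P$-party randomized simultaneous message
protocol for maximum matching with error at most $\epsilon < 1/2$, and all messages are of size at
most $s$. Then, $\mathcal{P}_{\mathrm{rand}}$ has an approximation factor of $\Omega\left(\left(\frac{Pn}{s}\right)^{1/4}\right)$.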

Our lower bound for one-pass turnstile algorithms now follows from the reduction given in
[22] and the application of Theorem 2 for $P = \sqrt{n}$.

Corollary 1. For every $0 \le \epsilon \le 1$, every randomized constant error turnstile one-pass
streaming algorithm for maximum bipartite matching with approximation ratio $n^{\epsilon}$ uses space $\Omega\left(n^{3/2-4\epsilon}\right)$.
5 Upper Bound 



Algorithm 1 Bipartite Matching algorithm
Require: $G = (A, B, E)$ {Bipartite input graph}

1: $A' \leftarrow$ subset of $A$ of size $k$ chosen uniformly at random
2: $\forall a \in A'$: $E'[a] \leftarrow$ arbitrary subset of incident edges of $a$ of size $\min\{k, \deg_G(a)\}$
3: return maximum matching in $\bigcup_{a \in A'} E'[a]$


In this section, we first present a simple randomized algorithm for bipartite matching. Then, 
we will discuss implementations of this algorithm as a simultaneous message protocol and as a 
dynamic one-pass streaming algorithm. 

Bipartite Matching Algorithm. Consider Algorithm 1. First, a subset $A' \subseteq A$ consisting of
$k$ vertices is chosen uniformly at random. Then, for each vertex $a \in A'$, the algorithm picks
$\min\{k, \deg_G(a)\}$ arbitrary incident edges. Finally, a maximum matching among the retained edges is computed
and returned.
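An offline version of Algorithm 1 can be sketched as follows. This is an illustration under assumptions: the graph is given explicitly as a list of `(a, b)` edges, "arbitrary" incident edges are taken to be the first $k$ in input order, the maximum matching is computed with a standard augmenting-path routine, and the names `algorithm1` and `max_bipartite_matching` are hypothetical.

```python
import random


def max_bipartite_matching(edges):
    """Augmenting-path (Kuhn) algorithm; returns a maximum matching as (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    match_of_b = {}  # b -> the a currently matched to b

    def try_augment(a, seen):
        for b in adj.get(a, []):
            if b in seen:
                continue
            seen.add(b)
            if b not in match_of_b or try_augment(match_of_b[b], seen):
                match_of_b[b] = a
                return True
        return False

    for a in adj:
        try_augment(a, set())
    return [(a, b) for b, a in match_of_b.items()]


def algorithm1(A, edges, k, rng):
    """Sample k vertices of A, keep min{k, deg(a)} edges each, match the kept edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    sampled = rng.sample(sorted(A), min(k, len(A)))            # the set A'
    stored = [(a, b) for a in sampled for b in adj.get(a, [])[:k]]
    return max_bipartite_matching(stored)                       # at most k^2 stored edges
```

On a graph in which every vertex of $A$ has degree at most $k$, all incident edges of the sampled vertices are stored, so the output matches every sampled vertex that is matched in some maximum matching of the stored subgraph.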

Clearly, the algorithm stores at most $k^2$ edges. The next lemma bounds the
approximation ratio of Algorithm 1; its proof is given below.

Lemma 4. Let $G = (A, B, E)$ be a bipartite graph with $|A| + |B| = n$. Then, Algorithm 1 has
an expected approximation ratio of $n/k$.

Notations. In the proof of Lemma 4, we use the following additional notation. Let $G = (A, B, E)$
be a bipartite graph. For a set of edges $E' \subseteq E$, we denote by $A(E')$ the subset of $A$ vertices $a$
for which there exists at least one edge in $E'$ incident to $a$. The set $B(E')$ is defined similarly.

Proof. Let $M$ denote the output of the algorithm, let $M^*$ be a maximum matching in $G$, and
let $E' = \bigcup_{a \in A'} E'[a]$. Let ${A'}^{*} = A' \cap A(M^*)$. As $A'$ has been chosen uniformly at random, we
have $E|{A'}^{*}| = \frac{k |A(M^*)|}{|A|}$. We prove now that the algorithm can match all vertices in ${A'}^{*}$. This
then implies the result, as $|M| \ge |{A'}^{*}|$, and

$$\frac{|M^*|}{E|M|} \le \frac{|M^*|}{E|{A'}^{*}|} = \frac{|M^*| \cdot |A|}{k |A(M^*)|} = \frac{|A|}{k} \le \frac{n}{k},$$

where we used $|A(M^*)| = |M^*|$.

To this end, we construct a matching $M'$ that matches all vertices of ${A'}^{*}$. Let $A'_1 \subseteq {A'}^{*}$ so
that for every $a \in A'_1$, the incident edge of $a$ in $M^*$ has been retained by the algorithm. Denote
by $M_1 \subseteq M^*$ the subset of optimal edges incident to the vertices $A'_1$. Then, let $A'_2 = {A'}^{*} \setminus A'_1$.
Consider now the graph $\tilde{G}$ on vertices $A'_2$ and $B \setminus B(M_1)$ and edges

$$\{e \in E' : e = (a, b) \text{ with } a \in A'_2 \text{ and } b \in B \setminus B(M_1)\}.$$

Note that as for every vertex $a \in A'_2$, its optimal incident edge has not been retained, $k$
different edges have been retained (which also implies that the degree of $a$ in $G$ is at least $k$).
Therefore, the degree of every $a \in A'_2$ in $\tilde{G}$ is at least $k - |B(M_1)| = k - |M_1|$. Furthermore, note that
$|A'_2| = |{A'}^{*}| - |A'_1| \le k - |M_1|$. Thus, by Hall's marriage theorem, there exists a matching $M_2$ in
$\tilde{G}$ matching all vertices of $A'_2$, and hence $|M_2| = |A'_2|$.

We set $M' = M_1 \cup M_2$ and all vertices of ${A'}^{*}$ are matched. We obtain $|M'| = |M_1| + |M_2| = |{A'}^{*}|$,
and the result follows. □

Implementation of Algorithm 1 as a Simultaneous Message Protocol. Algorithm 1 can be
implemented in the simultaneous message model as follows. Using shared random coins, the $P$
parties agree on the subset $A' \subseteq A$. Then, for every $a \in A'$, every party $p$ chooses arbitrary
$\min\{\deg_{G_p}(a), k\}$ edges incident to $a$ and sends them to the referee. The referee computes a
maximum matching in the graph induced by all received edges. As the referee receives a
superset of the edges as described in Algorithm 1, the same approximation factor as in Lemma 4
holds. We hence obtain the following theorem:

Theorem 3. For every $P \ge 1$, there is a randomized $P$-party simultaneous message protocol
for maximum matching with expected approximation factor $n^{\alpha}$ and all messages are of size
$\tilde{O}(n^{2-2\alpha})$.

Implementation of Algorithm 1 as a Dynamic Streaming Algorithm. We employ the technique of
$\ell_0$-sampling in our algorithm. For a turnstile stream that describes a vector $x$, an $\ell_0$-sampler
samples uniformly at random from the non-zero coordinates of $x$. Similar to Ahn, Guha, and
McGregor [2], we employ the $\ell_0$-sampler by Jowhari et al. [16]. Their result can be summarized
as follows:

Lemma 5 ([16]). There exists a turnstile streaming algorithm that performs $\ell_0$-sampling using
space $O(\log^2 n \log \delta^{-1})$ with error probability at most $\delta$.

In order to implement Algorithm 1 in the dynamic streaming setting, for every $a \in A'$, we
use enough $\ell_0$-samplers on the sub-stream of incident edges of $a$ in order to guarantee that with
large enough probability, at least $\min\{k, \deg_G(a)\}$ different incident edges of $a$ are sampled. It
can be seen that, for a large enough constant $c$, $c \cdot k \log n$ samplers are enough, with probability
$1 - \frac{1}{n^{\Theta(c)}}$. We make use of the following lemma whose proof is deferred to the appendix.

Lemma 6. Let $S$ be a finite set, $k$ an integer, and $c$ a large enough constant. When sampling
$c \cdot k \log n$ times from $S$, then with probability $1 - \frac{1}{n^{\Theta(c)}}$, at least $\min\{k, |S|\}$ different elements
of $S$ have been sampled.
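The statement of Lemma 6 can be explored with a small simulation. The sketch below makes several assumptions that are not fixed by the text: samples are drawn uniformly with replacement (as ideal $\ell_0$-samplers would provide), the logarithm is the natural logarithm, and the function name `distinct_after_sampling` is hypothetical.

```python
import math
import random


def distinct_after_sampling(S, k, c, n, rng):
    """Draw ceil(c * k * ln n) uniform samples from S and count the distinct elements seen."""
    draws = math.ceil(c * k * math.log(n))
    seen = {rng.choice(S) for _ in range(draws)}
    return len(seen)
```

For example, with $k = 10$, $c = 3$ and $n = 1000$, the function draws 208 samples; for sets of size 5, 10 or 50 the number of distinct elements then reaches $\min\{k, |S|\}$ except with negligible probability.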

This allows us to conclude with the main theorem of this section. 

Theorem 4. There exists a one-pass randomized dynamic streaming algorithm for maximum
bipartite matching with expected approximation ratio $n^{\alpha}$ using space $\tilde{O}(n^{2-2\alpha})$.
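The space bound can be traced through the construction with a short calculation. The sketch below assumes that each of the $k$ sampled vertices runs $c \cdot k \log n$ independent $\ell_0$-samplers with the space bound of Lemma 5, and that the error parameter $\delta$ is chosen as an inverse polynomial in $n$ (an assumption; the text does not fix $\delta$).

```latex
% k sampled vertices, c k log n samplers each, O(log^2 n log(1/delta)) space per sampler:
\[
k \cdot (c\, k \log n) \cdot O\!\left(\log^{2} n \,\log \delta^{-1}\right)
  = O\!\left(k^{2} \log^{3} n \,\log \delta^{-1}\right)
  = \tilde{O}\!\left(k^{2}\right).
\]
% Choosing k = n^{1-alpha} gives approximation ratio n/k = n^alpha and space
\[
\tilde{O}\!\left(k^{2}\right) = \tilde{O}\!\left(n^{2-2\alpha}\right).
\]
```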
A Auxiliary Lemma

Lemma 7. For positive integers $a, b, c$ so that $c \le a - b$, the following holds:

$$\binom{a-b}{c} \le \binom{a}{c} \frac{(a-c)^{b}}{(a-b)^{b}}.$$

Proof.

$$\binom{a-b}{c} = \frac{(a-b)!}{(a-b-c)! \, c!} \le \frac{a!}{(a-c)! \, c!} \cdot \frac{(a-c)^{b}}{(a-b)^{b}} = \binom{a}{c} \frac{(a-c)^{b}}{(a-b)^{b}}.$$

□
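The auxiliary lemma can be sanity-checked numerically. The sketch below reads the lemma as the inequality $\binom{a-b}{c} \le \binom{a}{c} \left(\frac{a-c}{a-b}\right)^{b}$, which is an assumption about the intended statement; it also checks the consequence used in the lower bound, $\binom{k-l}{k/2} \le (3/4)^{l} \binom{k}{k/2}$, under the additional assumption $l \le k/3$. Exact rational arithmetic avoids rounding issues, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def lemma7_holds(a, b, c):
    """Check C(a-b, c) <= C(a, c) * ((a-c)/(a-b))^b for positive integers with c <= a-b."""
    lhs = comb(a - b, c)
    rhs = comb(a, c) * Fraction(a - c, a - b) ** b
    return lhs <= rhs


def three_quarters_bound_holds(k, l):
    """Check C(k-l, k/2) <= (3/4)^l * C(k, k/2) for even k and l <= k/3."""
    return comb(k - l, k // 2) <= Fraction(3, 4) ** l * comb(k, k // 2)
```

The inequality holds because the ratio $\binom{a-b}{c} / \binom{a}{c}$ equals $\prod_{i=0}^{b-1} \frac{a-c-i}{a-i}$, and every factor of this product is at most $\frac{a-c}{a-b}$.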


B Missing Proofs 

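B.1 Missing Proof of Theorem 2

Theorem 2. For any $P \le \sqrt{n}$, let $\mathcal{P}_{\mathrm{rand}}$ be a $P$-party randomized simultaneous message
protocol for maximum matching with error at most $\epsilon$, and all messages are of size at most $s$.
Then, $\mathcal{P}_{\mathrm{rand}}$ has an approximation factor of $\Omega\left(\left(\frac{Pn}{s}\right)^{1/4}\right)$.

Proof. Let $\mathcal{P}_{\mathrm{rand}}$ be a $P$-party randomized simultaneous message protocol for maximum
matching with error probability at most $\epsilon < 1/2$ and approximation factor $\alpha$. Then, by Yao's lemma,
there exists a deterministic protocol $\mathcal{P}_{\mathrm{det}}$ with approximation ratio $\alpha$, distributional error $\epsilon$,
and messages of length at most $s$.

Consider the input distribution as described in Section 3, let $\mathcal{G}$ denote all possible input
graphs, and for a graph $G \in \mathcal{G}$, denote by $N_G$ the matching outputted by $\mathcal{P}_{\mathrm{det}}$. Furthermore,
let $\mathcal{G}_e \subseteq \mathcal{G}$ denote those inputs on which $\mathcal{P}_{\mathrm{det}}$ errs.

A maximum matching in $G$ is of size at least $\frac{k(Q+P)}{2}$. As the approximation factor is $\alpha$, we
have for every $G \in \mathcal{G} \setminus \mathcal{G}_e$: $|N_G| \ge \frac{k(Q+P)}{2\alpha}$. Hence,

$$E_{G \in \mathcal{G}} |N_G| = (1 - \epsilon) \cdot E_{G \in \mathcal{G} \setminus \mathcal{G}_e} |N_G| + \epsilon \cdot E_{G \in \mathcal{G}_e} |N_G| \ge (1 - \epsilon) \frac{k(Q+P)}{2\alpha}.$$

From Equation 4 from the proof of Theorem 1 we obtain $E_{G \in \mathcal{G}} |N_G| \le 2Qk + P \cdot O\left(\frac{\sqrt{sk}}{Q}\right)$, and
hence

$$(1 - \epsilon) \frac{k(Q+P)}{2\alpha} \le 2Qk + P \cdot O\left(\frac{\sqrt{sk}}{Q}\right), \quad \text{implying} \quad \alpha = \Omega\left( \frac{k(Q+P)}{Qk + P \frac{\sqrt{sk}}{Q}} \right).$$

Note that this term coincides with the term in Inequality 5 of the proof of Theorem 1. Optimizing
similarly ($k = n/P$, $Q = \left(P \sqrt{s/k}\right)^{1/2}$), we obtain $\alpha = \Omega\left(\left(\frac{Pn}{s}\right)^{1/4}\right)$. □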


B.2 Missing Proof of Lemma 6

Lemma 6. Let $S$ be a finite set, $k$ an integer, and $c$ a large enough constant. When sampling
$c \cdot k \log n$ times from $S$, then with probability $1 - \frac{1}{n^{\Theta(c)}}$, at least $\min\{k, |S|\}$ different elements
of $S$ have been sampled.
Proof. We consider the following cases:

1. Suppose that $|S| = k$. Then, we have an instance of the coupon collector's problem. The
expected number of times an item $s \in S$ is sampled is $\frac{1}{k} \cdot c \cdot k \log n = c \log n$. Then, by a
Chernoff bound, the probability that $s$ is not sampled is $\frac{1}{n^{\Theta(c)}}$, and using the union bound,
the probability that there exists at least one element from $S$ that has not been sampled is
$\frac{1}{n^{\Theta(c)}}$.

2. Suppose that $|S| < k$. This case is clearly easier than the case $|S| = k$, as fewer elements
have to be sampled (only $|S|$ instead of $k$) and the sampling probability for an element is
higher. Therefore, the error probability is smaller than in Case 1.

3. Suppose now that $|S| > k$. This case is also easier than the case $|S| = k$, since the same
total number of different samples is required, and the domain from which the samples are
chosen is larger ($|S|$ instead of $k$). Therefore, the error probability is also smaller than
in Case 1.

□